TOP SECRET 



■ ^ ■ Communications Security 
Establishment Canada 




Centre de la securite 

des telecommunications Canada 



IP Profiling Analytics 
& Mission Impacts 




Tradecraft Developer 

CSEC - Network Analysis Centre 

May 10, 2012 



Canada 



SIGINT 








TOP SECRET 



Example IP Profile Problem 

Target appears on IP address, wish to understand 
network context more fully 

Example Quova look-up & response for 

Lat. 60.00 Long: -95.00 (in frozen tundra W. of Hudson Bay) 

City: unknown 
Country: Canada, 

Operator: Bell Canada, Sympatico 



Issues with IP look-up data: 

is it actually revealing, or is it opaque 
is the data even current, or is it out-of-date 
was the data ever accurate in the first place 
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Objectives 

Develop new analytics to provide richer contextual 
data about a network address 

Apply analytics against Tipping & Cueing objectives 

Build upon artefact of techniques to develop new 
needle-in-a-haystack analytic - contact chaining 
across air-gaps 
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Analytic Concept - Start with Travel Node 



Begin with single seed Wi-Fi IP address of inti, airport 




/ 



Assemble set of user IDs seen on network address 
over two weeks 
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Profiling Travel Nodes - Next Step 



Follow IDs backward and forward in recent time 



Earlier IP clusters of: Later IP clusters of: 

- local hotels - other inti, airports 

. domestic airports - domestic airports 

- local transportation hubs - major inti, hotels 

. local internet cafes - etc. 
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IP Hopping Forward in Time 

Follow IDs forward in time to 
next IP & note delta time 



Next IP sorted 1 Hr. 

by most popular: < 1 — 




2 Hr. 3 Hr. 4 Hr. 5 Hr. A time 



Many clusters will resolve to other Airports! 

Can then take seeds from these airports and 
repeat to cover whole world 




Ditto for going backward in time, can uncover 
roaming infrastructure of host city: hotels, 
conference centers, Wi-Fi hotspots etc. 
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Data Reality 

The analytic produced excellent profiles, but was more 
complex than initial concept suggests 

Data had limited aperture - Canadian Special Source 

major CDN ISPs team with US email majors, losing travel coverage 



Behaviour at airports 

little lingering on arrival; arrivals using phones, not WiFi 

still, some Wi-Fi use when waiting for connecting flight/baggage 

different terminals: domestic/international; also private lounges 



Very many airports and hotels served by large Boingo 
private network 

not seen in aperture; traffic seems to return via local Akamai node 
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Tradecraft Development Data Set 



Have two weeks worth of ID-IP data from Canadian Special 
Source - 



Had program access to Quova dataset connecting into Atlas 
database 



Had seed knowledge of a single Canadian Airport WiFi IP 
address 
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Hop Geo Profile From CDN Airport Inti. Terminal 



Long Longitude scale is non-linear 
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Profiled/seed IP location: Square = geographic location 

Hopped-to IP location: ^ine height = numbers of unique hopped-to IPs at location 

Plot of where else IDs seen at seed IP have been seen in two weeks 

Plot shows most hopped to IPs are nearby - confirming reported seed geo data 
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Effect of Invalid Geo Information 



Long Longitude scale is non-linear 
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Geo incongruence: 
displacement of seed location 
from distrit ution center 
strongly su< igests data error 
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Profiled/seed IP location: [Square = geographic location 

Hopped-to IP location: ^ine height = numbers of unique hopped-to IPs at location 



A 

Effect of invalid seed geo information readily apparent 
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Hop-Out Destinations Seen 

Other domestic a rports 

Other terminals, lounges, transport hubs 

Hotels in many cities 

Mobile gateways in many cities 



Etc. 
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"Discovered" Other CDN Airport IP 
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Domestic terminal 

Closeness of majority of hopped-to IPs confirms geo data 
But, domestic airport can also look like a busy hotel ... 



horizontal line shows presence pattern of one ID, sorted by order of appearance 
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IDs Presence Profile at "Discovered" Airport 
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Time/days 

Dominant pattern is each ID is seen briefly, just once - as expected 
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Profiles of Discovered Hotel 



Many IDs present over a few days 
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Time/days -> 
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Profiles of Discovered Enterprise 



Time/days -> 



Regular temporal presence (M-F) with local geographic span 
Contrasts well against travel/roaming nodes 
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Discovered Coffee Shop, Library 



Coffee shop 



Time/days -> 



Library 



• __ 




• 




j ■ 




• • 


• • • 










• 


• 






1 












• 










• _ , 


• 














i 




















•• __ • 
^,7 


• 


“ 














'TL ’ 


• 














X. 


a 
















" 'V * 



Time/days 



Similar patterns of mixed temporal & local geographic presence 
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Discovered Wireless Gateway 



Wireless gateway not unlike a hotel, except ... 



Time/days 
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Partial Range Profile of Wireless Gateway 



500 

Number of 400 
IDs seen 

on each IP 300 
200 
100 
o 

Individual IP number in range of 8 




■ ID Total on IP 
□ Common IDs 



For wireless gateway, range behaviour is revealing 

Most IDs seen on an IP are also scattered across entire range 

ID totals & traffic across full range is very high 
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Mission Impact of IP Profiling 

Tipping and Cueing Task Force (TCTF) 

a 5-Eyes effort to enable the SIGINT system to provide real-time alerts of 
events of interest 

alert to: target country location changes, webmail logins with time-limited 
cookies etc. 



Targets/Enemies still target air travel and hotels 

airlines: shoe/underwear/printer bombs ... 

hotels: Mumbai, Kabul, Jakarta, Amman, Islamabad, Egyptian Sinai ... 



Analytic can hop-sweep through IP address space to identify 
set of IP addresses for hotels and airports 

detecting target presence within set will trigger an urgent alert 
aim to productize analytics to reliably produce set of IPs for alerting 
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IP Profiling Summary 

Different categories of IP ownership/use show distinct 
characteristics 

airports, hotels, coffee shops, enterprises, wireless gateways etc. 
clear characteristics enable formal modeling developments 

clear identification of hotels and airports enables critical Tipping & Cueing 
tradecraft 



Geo-hop profile can confirm/refute IP geo look-up 
information 

later could fold-in time deltas for enhanced modeling 



Can "sweep" a region/city for roaming access points to IP 
networks 

leads to a new needle-in-a-haystack analytic ... 
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Tradecraft Problem Statement 

A kidnapper based in a rural area travels to an urban area to 
make ransom calls 

can't risk bringing attention to low-population rural area 
won't use phone for any other comms (or uses payphones ...) 



Assumption: He has another device that accesses IP 
networks from public access points 

having a device isn't necessary, could use internet cafes, libraries etc. 
he is also assumed to use IP access around the time of ransom calls 



Question: Knowing the time of the ransom calls can we 
discover the kidnapper's IP ID/device 

"contact chain" across air-gap (not a correlation of selectors) 
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Solution Outline 

With earlier IP profiling analytics, we can "sweep" a 
city/region to discover and determine public accesses 

We can then select which IP network IDs are seen as active 
in all times surrounding the known ransom calls 

reduce set to a shortlist 



Then we examine the reduced set of IP network IDs and 
eliminate baseline heavy users in the area that fall into the 
set intersection just because they are always active 

that is, eliminate those that are highly active outside the times of the ransom 
calls 

hopefully leaves only the one needle from the haystack 
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First Proof-of-Concept 

Swept a modest size city and discovered two high traffic 
public access ranges with >300,000 active IDs over 2 weeks 

used for initial expediency due to computational intensity 



Presumed that there were 3 ransom calls, each 50 hours 
apart during daytime, looked for IDs within 1 Hr of calls 

reduce large set to a shortlist of just 19 IP network IDs 



Examined activity level of 19 IP network IDs - how many 
presences each had in 1 Hr slots over two weeks 

main worry as the computation was running: there would be a lot of IDs that 
showed just a handful of appearances: e.g. 3, 4, 5 instances 



TOP SECRET 



ID Presence of Shortlist 



Each horizontal line shows presence of ID overtime/hour-slots 
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Time/hour-slots -> 



Happy result: least active ID had appearances in 40 hour-slots! 

Thus could eliminate all, leaving just the kidnapper (if he was there) 
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Big-Data Computational Challenge 

All the previous analytics, while successful experimentally, 
ran much too slowly to allow for practical productization 

CARE: Collaborative Analytics Research Environment 

a big-data system being trialed at CSEC (with NSA launch assist) 
non-extraordinary hardware 

minimal impedance between memory, storage and processors 
highly optimized, in-memory database capabilities 
columnar storage, high performance vector functional runtime 
powerful but challenging programming language (derived from APL) 



Result of first experiments with CARE: game-changing 

run-time for hop-profiles reduced from 2+ Hrs to several seconds 
allows for tra dec raft to be profitably productized 
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Overall Summary 

IP profiling showing terrific value 

significant analytic asset for IP networks and target mobility 
enables critical capability within Tipping & Cueing Task force 
working to productize on powerful new computational platform 
broader SSO accesses/apertures coming online at CSEC 
look to formalize models & fold-in timing deltas 



A new needle-in-a-haystack analytic is viable: contact 
chaining across air-gaps 

enabled by sweep capability of IP profiling 

should test further to understand robustness with respect to loosening 
assumptions of target behaviour 

beyond kidnapping, tradecraft could also be used for any target that makes 
occasional forays into other cities/regions 
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^^■Tradecraft Studio Example 




Possible route for productizing analytics 





